

The  Hopfield  Model  and  Beyond 


Abstract 

The  standard  Hopfield  model  (both  digital  and  analog) 
and  algorithms  to  improve  its  performance  are  reviewed. 
An  analysis  of  the  model  and  the  modification  algorithms 
is  given.  Future  directions  for  continuous  models  which 
have both large capacity and good error-correcting capabilities
are examined.
The Discrete Hopfield Model


1.1  Introduction  to  the  Model 

In 1982, Hopfield [1] proposed a neural model of memory storage and retrieval based on the theory of spin glasses in solid state physics. In the model, neurons are binary-valued threshold units, taking either the value 0 or 1 in one version, or 1 or -1 in an alternative version. This digital restriction represents the neuron in two possible states: a 1 represents a neuron that is firing, while a 0 or a -1 represents a neuron that is inactive. Mathematically, this corresponds to replacing the experimentally observed neuronal input-output relationship, a graded response which can be characterized by a sigmoid function, with a step function. The neurons form a single layer and are completely interconnected, with the strength of these connections, or “synapses”, given by a correlation matrix formed from the memory states to be stored in the system:


$$ w_{ij} = \sum_{s=1}^{m} \mu_i^s \mu_j^s \qquad (1.1) $$
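As a concrete illustration of the correlation matrix in (1.1), the following minimal Python sketch builds $w_{ij}$ from a small set of (+/-)1 memory states. The function name `correlation_matrix` and the two example patterns are illustrative assumptions and do not come from the original report; self-connections are kept, as in the version of the model discussed here.

```python
def correlation_matrix(patterns):
    """Return w[i][j] = sum over stored states s of mu_i^s * mu_j^s."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for mu in patterns:
        for i in range(n):
            for j in range(n):
                w[i][j] += mu[i] * mu[j]
    return w


# Two hypothetical (+/-)1 memory states on N = 4 neurons.
patterns = [[1, -1, 1, -1], [1, 1, -1, -1]]
w = correlation_matrix(patterns)
for row in w:
    print(row)
```

The diagonal entries equal the number of stored states, and the matrix is symmetric by construction.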

Once  the  layer  of  neurons  is  given  an  input,  that  is,  when  the  neurons 
are  set  to  some  initial  configuration  of  values,  the  neurons  are  updated 
asynchronously and in a random order. The updating procedure, which is given by:¹

$$ \mu_i \rightarrow \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j \Big) \qquad (1.2) $$

amounts to a relaxation process.² This follows from the fact that the
updating procedure minimizes an associated energy functional, or Liapunov function. The energy function which is minimized during the relaxation process is:

$$ \mathcal{E} = -\frac{1}{2} \sum_{i,j} w_{ij} \mu_i \mu_j \qquad (1.3) $$

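That $\mathcal{E}$ is a Liapunov function may be demonstrated as follows [1]. Note that when the $i$th neuron changes by $\Delta\mu_i$:

$$ \Delta\mathcal{E} = -\Delta\mu_i \sum_{j=1}^{N} w_{ij} \mu_j \qquad (1.4) $$

Also observe that:

$$ \Delta\mu_i > 0 \;\Rightarrow\; \sum_j w_{ij}\mu_j > 0 \;\Rightarrow\; \Delta\mathcal{E} < 0 \qquad (1.5) $$

$$ \Delta\mu_i < 0 \;\Rightarrow\; \sum_j w_{ij}\mu_j < 0 \;\Rightarrow\; \Delta\mathcal{E} < 0 $$

$$ \Delta\mu_i = 0 \;\Rightarrow\; \Delta\mathcal{E} = 0 $$

Thus:

$$ \Delta\mathcal{E} \le 0, \text{ and } \Delta\mathcal{E} = 0 \text{ iff } \Delta\mu_i = 0 \qquad (1.6) $$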

When this observation is combined with the fact that $\mathcal{E}$ is a bounded function, the proof that the relaxation process will descend to a local minimum is complete.
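The descent argument can be checked numerically. The sketch below is a minimal illustration, not the original simulation code: it applies the asynchronous update (1.2) in random order and records the energy (1.3) after every flip. The names `theta`, `energy`, and `relax`, the example patterns, and the convention $\theta(0) = +1$ are assumptions made for this example.

```python
import random


def theta(x):
    # (+/-)1 convention: 2*Heaviside(x) - 1, taking theta(0) = +1 (assumption).
    return 1 if x >= 0 else -1


def energy(w, mu):
    n = len(mu)
    return -0.5 * sum(w[i][j] * mu[i] * mu[j] for i in range(n) for j in range(n))


def relax(w, mu, rng, max_sweeps=100):
    """Asynchronous relaxation; returns the final state and the energy trace."""
    mu = list(mu)
    n = len(mu)
    energies = [energy(w, mu)]
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)  # random update order
        for i in order:
            new = theta(sum(w[i][j] * mu[j] for j in range(n)))
            if new != mu[i]:
                mu[i] = new
                changed = True
                energies.append(energy(w, mu))
        if not changed:  # a full sweep with no flips: fixed point reached
            break
    return mu, energies


patterns = [[1, -1, 1, -1, 1, -1, 1, -1], [1, 1, 1, 1, -1, -1, -1, -1]]
w = [[sum(p[i] * p[j] for p in patterns) for j in range(8)] for i in range(8)]
rng = random.Random(0)
start = [rng.choice([-1, 1]) for _ in range(8)]
final, energies = relax(w, start, rng)
print(final, energies)
```

For a symmetric matrix with non-negative diagonal, each recorded energy is no larger than the previous one, and the loop stops at a state that the update rule leaves unchanged.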

The aim of the Hopfield model is the categorization of an input state according to the stored state to which it is most similar. If the Hopfield model functioned ideally, therefore, the stored states would be the only minima of the relaxation process and would divide the space so that their radii of attraction draw the input state to the stored state which it most resembles.

¹ Henceforth, we will simply write the function $\theta$(arg.) to represent $2\theta(\text{arg.}) - 1$ when we are dealing with neurons whose values are (+/-)1, and it will be understood to mean the standard Heaviside function when the neurons have the value 0 or 1.

² The original model had spins 0 and 1 and no self-connections, but experimental evidence suggests that a system with spins (+/-)1 works better [1] and that adding the self-connections improves the performance still more [7].

In its original form, the standard Hopfield model, as we have described it above, functions very poorly as a categorizer when $(m/N) > 0.1$.² In his 1982 paper, for instance, Hopfield [1] found that when $(m/N) \lesssim 0.05$ to $0.1$, the stored states can all be perfectly recalled when presented as input states; that is, they are fixed points, or minima, of the relaxation process. The radius of attraction is also reasonable in this range of $(m/N)$. To be concrete, Hopfield found that for $(m/N) = 0.05$ with $N = 30$, 90 percent of the random starting states within a radius of 5 Hamming units of the stored states relaxed to the target stored state. When $(m/N)$ is above the range 0.05 to 0.1, the attractive capability of the stored states decays rapidly, and the percentage of stored states which are minima also quickly declines. For instance, as a benchmark, at $(m/N) = 0.15$, Hopfield found that only half of the stored states were fixed points.

1.2  Methods  to  Improve  the  Performance  of 
Hopfield’s  Model 

1.2.1  “Unlearning” 

Given the limitations of the original model, improvements have been sought. An early approach, first tried by Hopfield [2], was given by him the name “unlearning”, after a term first coined by Crick to explain the biological purpose of sleep in humans and animals as a period during which unneeded information is erased and stored information compacted. In Hopfield’s algorithm, a random state is relaxed to a stable state $\mu^f$ (often a spurious attractor), a correlation matrix is formed from this state, and then an amount proportional to this is subtracted from the original matrix:

$$ w_{ij} \rightarrow w_{ij} - \alpha \mu_i^f \mu_j^f \qquad (1.7) $$

The operation is repeated $k$ times. In simulations of Hopfield’s “unlearning”, Terry Potter [7] has found that $k$ is optimal for:

(1.8) 

² $m$ = the number of stored states. $N$ = the number of neurons.


With “unlearning”, the number of stored states that can be correctly recalled approaches the dimensionality, $N$, and error correction is improved but falls to zero as $m \rightarrow N$ [7].
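The “unlearning” loop can be sketched in a few lines of Python. This is only an illustration of the update (1.7) under stated assumptions: the helper names `relax` and `unlearn` are hypothetical, the example patterns are arbitrary, and the values of `k` and `alpha` are picked for demonstration only (they are not the optimal values referred to above).

```python
import random


def theta(x):
    return 1 if x >= 0 else -1  # theta(0) = +1 is an assumption


def relax(w, mu, rng, max_sweeps=100):
    """Asynchronous relaxation of a (+/-)1 state to a stable state."""
    mu = list(mu)
    n = len(mu)
    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            new = theta(sum(w[i][j] * mu[j] for j in range(n)))
            if new != mu[i]:
                mu[i] = new
                changed = True
        if not changed:
            break
    return mu


def unlearn(w, k, alpha, rng):
    """Repeat k times: relax a random state, then subtract alpha * f_i * f_j."""
    n = len(w)
    for _ in range(k):
        start = [rng.choice([-1, 1]) for _ in range(n)]
        f = relax(w, start, rng)  # stable state, often a spurious attractor
        for i in range(n):
            for j in range(n):
                w[i][j] -= alpha * f[i] * f[j]
    return w


n = 8
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, 1, -1, -1, 1, -1],
]
w = [[float(sum(p[i] * p[j] for p in patterns)) for j in range(n)] for i in range(n)]
trace_before = sum(w[i][i] for i in range(n))
unlearn(w, k=5, alpha=0.01, rng=random.Random(0))
trace_after = sum(w[i][i] for i in range(n))
print(trace_before, trace_after)
```

Because every relaxed state has entries (+/-)1, each unlearning step lowers every diagonal element by exactly `alpha` and leaves the matrix symmetric.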


1.2.2  An  Alternative  Approach  to  Improvement  of 
the  Hopfield  Model 

Recently, an interesting variation of Hopfield’s “unlearning” has been studied experimentally by Terry Potter [7]. The algorithm is a hybrid combining elements of Hopfield’s “unlearning” with a modification reminiscent of the Widrow-Hoff algorithm from linear filter theory. As a quick review, we present the rudiments of the Widrow-Hoff algorithm and then examine which elements of it have been carried over into Potter’s algorithm.

The goal of the traditional Widrow-Hoff algorithm is to associate pairs of real-valued inputs, $x^k$, with real-valued outputs, $y^k$. A linear input-output function, represented by a matrix $A$, is postulated. The algorithm converges to an optimal matrix $A^*$ by minimizing the mean-square error between the target state and the actual output of the matrix $A$. For a proof of this, see T. Kohonen (1974) [6]. In practice, the derivative of the partial error due to the $k$th pattern:

$$ \frac{\partial E^k}{\partial A_{ij}} = -2 \big( y_i^k - (A x^k)_i \big) x_j^k \qquad (1.9) $$

is used as an approximation to the derivative of the total error in a gradient descent algorithm, which has the form:

$$ A_{ij} \rightarrow A_{ij} - \alpha \frac{\partial E^k}{\partial A_{ij}} \qquad (1.10) $$

The patterns are presented successively, and a modification is made at each step. The motivation for such an algorithm can be depicted graphically: the graph displays the error $E^k$ versus the matrix element $A_{ij}$; the algorithm chooses at each step a better approximation to the global minimum by generating a matrix $A$ which is a better approximation to the minimum of $E^k$. In the limit as the number of presentations increases, the algorithm approaches the global minimum asymptotically, provided that $\alpha$ is chosen sufficiently small [6].
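A minimal sketch of the Widrow-Hoff iteration, combining (1.9) and (1.10), is given below. The function name `widrow_hoff`, the training pairs, the step size, and the number of passes are illustrative assumptions; the example uses orthonormal inputs so that the limit is easy to verify by hand.

```python
def widrow_hoff(pairs, n_out, n_in, alpha=0.05, epochs=200):
    """Present the (x, y) pairs successively and take one gradient step per pattern."""
    A = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in pairs:
            out = [sum(A[i][j] * x[j] for j in range(n_in)) for i in range(n_out)]
            for i in range(n_out):
                err = y[i] - out[i]
                for j in range(n_in):
                    # A_ij -> A_ij - alpha * dE^k/dA_ij, with dE^k/dA_ij = -2 * err * x_j
                    A[i][j] += 2.0 * alpha * err * x[j]
    return A


# Hypothetical associations: x = (1, 0) -> y = 2 and x = (0, 1) -> y = -1.
pairs = [([1.0, 0.0], [2.0]), ([0.0, 1.0], [-1.0])]
A = widrow_hoff(pairs, n_out=1, n_in=2)
print(A)
```

With these inputs each presentation shrinks the remaining error on its own matrix element by the factor $1 - 2\alpha$, so the matrix approaches $[2, -1]$ geometrically.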

Having described the original Widrow-Hoff algorithm, let us now examine how this has been woven, along with Hopfield’s “unlearning”, into a hybrid algorithm by Potter. There are actually two versions of the algorithm. In one version, following a suggestion by Professor Cooper, the stored states are used as the input states in the relaxation process; for a given stored state, modification is done only if that state is unstable. In a second version of the algorithm, all of the states at a radius of one Hamming unit from each stored state are relaxed; a modification is made if the relaxed state misses its target stored state.

The similarity of Potter’s algorithm to that of Widrow and Hoff lies in the actual form of the modification used once a state, selected as described above, is relaxed:

$$ w_{ij} \rightarrow w_{ij} + \alpha \big( \mu_i - \mu_i^{\mathrm{relaxed}} \big) \mu_j ( \mu_j + 1 ) \qquad (1.11) $$


By analogy with the original Widrow-Hoff equation, 1.10, the result of the entire relaxation process, $\mu_i^{\mathrm{relaxed}}$, now replaces the output of Widrow-Hoff’s linear input-output function, $(A x^k)_i$. One should note, however, that the form is not exactly the same as the original Widrow-Hoff because of the presence of the factor $\mu_j + 1$. This additional factor and the symmetry of the modification in the indices $i$ and $j$ establish an additional restriction on the modification criteria mentioned above: no modification will be made to $w_{ij}$ (and $w_{ji}$) unless either one or both of the $i$th and $j$th elements of the input state has the value one. Potter has found that this additional proviso is an important factor in the effectiveness of his algorithm. There is one more essential difference: the symmetry of the synaptic matrix is preserved by making the same modification to $w_{ji}$ each time a modification is performed on the element $w_{ij}$. This ensures that the relaxation algorithm will continue to descend monotonically; in order to avoid the possibility of limit cycles, the symmetry of the matrix must be preserved [1].

As  an  overview  of  the  algorithm,  then,  the  relaxation  process  as  a  whole 
has  been  embedded  in  place  of  the  output  of  the  linear  operator  in  the 
original  Widrow-Hoff  theory,  the  overall  morphology  of  the  modification 
has  been  modified  slightly,  and  additionally,  symmetric  modifications  have 
been  added. 
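The modification step described above can be sketched as follows. This is a hedged reading of the rule, not Potter's code: the function name `potter_modify` is hypothetical, the update is assumed to have the form $\alpha(\mu_i - \mu_i^{\mathrm{relaxed}})\mu_j(\mu_j + 1)$, and mirroring each change onto the transposed element (once only on the diagonal) is an assumption about how the symmetric modification is applied.

```python
def potter_modify(w, stored, relaxed, alpha):
    """Apply one Potter-style modification in place and return w."""
    n = len(stored)
    for i in range(n):
        for j in range(n):
            delta = alpha * (stored[i] - relaxed[i]) * stored[j] * (stored[j] + 1)
            if delta == 0:
                continue
            w[i][j] += delta
            if i != j:
                w[j][i] += delta  # mirror the change to keep w symmetric
    return w


stored = [1, -1, 1, -1]
relaxed = [1, 1, 1, -1]  # hypothetical relaxation result: neuron 1 flipped
w = [[0.0] * 4 for _ in range(4)]
potter_modify(w, stored, relaxed, alpha=0.1)
for row in w:
    print(row)
```

In this example only the elements linking the flipped neuron to neurons whose stored value is +1 change; the element linking neurons 1 and 3, both -1 in the input state, is left untouched, in line with the proviso stated above.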

In simulations carried out by Potter [7] using the stored states as the input for the modification, he was able to achieve $\sim N^2$ stable stored states (fixed points).³ At this density of stored states, the radius of attraction, however, cannot be completely guaranteed for even a radius of one Hamming unit: at a radius even so small as this, the percentage of states which converge to the target stored state is $\sim$ 40 to 60 percent.³

³ A greater number of states was not attempted.

For simulations in which Potter used all of the input states at a radius of one unit from each of the stored states, he found that for $m$ just below the dimensionality, $N$, a radius of attraction of one Hamming unit could be completely guaranteed. Above the dimensionality, the radius of attraction and the percentage of stable stored states decay.

In summary, then, with Potter’s algorithm, the capacity of the Hopfield model to generate stable stored states can be vastly improved, but with no radius of attraction. Conversely, with the alternative version of his algorithm, confining the number of stored states to be just below the dimensionality, only a severely limited radius of attraction can be constructed around the stored states.

1.3  Some  Analysis  of  the  Hopfield  Model 
and  Potter’s  Algorithm 

To begin with, we note that in the standard Hopfield model, $N$ stable stored states could have been achieved if the coding of the stored states had been selected so that all of the stored states were mutually orthogonal. The proof is as follows. Suppose we choose $m$ mutually orthogonal stored states. Label them by the index $s: 1 \cdots m \le N$. Now, examine what happens when the relaxation process is applied to one of the stored states $s'$:

$$ \begin{aligned} \mu_i^{s'} \rightarrow{} & \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j^{s'} \Big) \\ ={} & \theta\Big( \sum_{s=1}^{m} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) \Big) \\ ={} & \theta( N \mu_i^{s'} ) \\ ={} & \mu_i^{s'} \qquad (1.12) \end{aligned} $$

Thus, no matter which neuron is sampled by the updating procedure, it returns the same value for the neuron. Thus, all of the stored states are stable.

³ Provided the self-connections are used; if the self-connections are removed when the synapses are formed, the performance at this radius is even worse [7].
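The orthogonal case can be verified directly. The sketch below takes the rows of a Sylvester-type Hadamard matrix as one convenient set of mutually orthogonal (+/-)1 states; this choice, and the names `hadamard` and `theta`, are assumptions made for illustration and are not part of the original report.

```python
def hadamard(n):
    """Sylvester construction; n must be a power of two. Rows are mutually orthogonal."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def theta(x):
    return 1 if x >= 0 else -1


N = 8
stored = hadamard(N)  # m = N mutually orthogonal stored states
w = [[sum(p[i] * p[j] for p in stored) for j in range(N)] for i in range(N)]

# Local field seen by each neuron when each stored state is presented.
fields = [[sum(w[i][j] * p[j] for j in range(N)) for i in range(N)] for p in stored]
all_stable = all(
    theta(fields[s][i]) == stored[s][i] for s in range(N) for i in range(N)
)
print(all_stable)
```

Every field equals $N\mu_i^{s'}$, the pure “signal” of (1.12), so each of the $N$ stored states is a fixed point.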

Professor  Cooper  has  pointed  out  that  even  in  a  linear  system,  we  should 
expect  this  result.  The  argument  is  as  follows.  Suppose,  in  such  a  system 
we  choose: 


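$$ w_{ij} = \sum_{s=1}^{m} \mu_i^s \mu_j^s \qquad (1.13) $$

where the vectors $\vec{\mu}^s$, $s: 1 \cdots m \le N$, are mutually orthogonal. Such a set can always be generated from a linearly independent set of states via a Gram-Schmidt process. Up to an overall normalization, we can rewrite the matrix $w_{ij}$ in terms of an orthonormal set of vectors $\vec{x}^s$ as:

$$ w_{ij} = \sum_{s=1}^{m} x_i^s x_j^s \qquad (1.14) $$

If the state $\vec{x}^{s'}$ is now presented, then the output is:

$$ ( w \cdot \vec{x}^{s'} )_i = \sum_{j=1}^{N} \sum_{s=1}^{m} x_i^s x_j^s x_j^{s'} = \sum_{s=1}^{m} x_i^s \delta_{ss'} = x_i^{s'}, \qquad (1.15) $$

and the stored state is perfectly recalled.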

Let  us  now  return  to  our  analysis  of  the  Hopfield  model  and  examine 
the  relaxation  process  when  the  states  are  not  orthogonal.  Given  this,  if 
we  relax  one  of  the  stored  states,  we  find  in  the  first  iteration,  or  in  any 
iteration  prior  to  which  no  neuron  has  changed  its  value: 

$$ \begin{aligned} \mu_i^{s'} \rightarrow{} & \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j^{s'} \Big) \\ ={} & \theta\Big( \sum_{s=1}^{m} \sum_{j=1}^{N} \mu_i^s \mu_j^s \mu_j^{s'} \Big) \\ ={} & \theta\Big( N \mu_i^{s'} + \sum_{s \ne s'} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) \Big) \qquad (1.16) \end{aligned} $$

Note that if the second term in the argument, $\sum_{s \ne s'} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} )$, is larger in magnitude than $N \mu_i^{s'}$ and of opposite sign, then the spin will be flipped and the stored state $s'$ will be unstable. This second term, arising from state-to-state interaction, can thus be viewed as “noise”, which competes with the “signal”, $N \mu_i^{s'}$. It is this “noise” which causes the standard Hopfield model to function so poorly. As a very crude estimate of this “noise” term, note that:

$$ \Big| \sum_{s \ne s'} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) \Big| \le (m-1) N. \qquad (1.17) $$

As an example, with $m = 0.1N$ and $N = 100$, this bound is $9N$. Given this, it is easy to see how random fluctuations in the initial selection of the stored states could easily generate a “noise” term large enough to swamp the “signal.”
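The split of the local field into “signal” and “noise” can be reproduced numerically. The following sketch is illustrative only: the helper `field_terms`, the seed, and the sizes `N` and `m` are assumptions. It checks that the field on a stored state equals signal plus noise and that the noise respects the crude bound $(m-1)N$.

```python
import random


def field_terms(patterns, s_prime, i):
    """Return (signal, noise) for neuron i when stored state s_prime is presented."""
    n = len(patterns[0])
    target = patterns[s_prime]
    signal = n * target[i]
    noise = sum(
        p[i] * sum(a * b for a, b in zip(p, target))
        for s, p in enumerate(patterns)
        if s != s_prime
    )
    return signal, noise


rng = random.Random(1)
N, m = 20, 4
patterns = [[rng.choice([-1, 1]) for _ in range(N)] for _ in range(m)]
w = [[sum(p[i] * p[j] for p in patterns) for j in range(N)] for i in range(N)]

rows = []
for i in range(N):
    signal, noise = field_terms(patterns, 0, i)
    total = sum(w[i][j] * patterns[0][j] for j in range(N))
    rows.append((signal, noise, total))
print(rows[:3])
```

A neuron of the stored state flips exactly when its noise term outweighs the signal with the opposite sign, which is the failure mode described above.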

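It is logical to ask how the above analysis affects error correction. We are led to consider, therefore, the relaxation process for a more general state, that is, for one of the stored states with some error:

$$ \vec{\mu} = \vec{\mu}^{s'} + \vec{\epsilon}. \qquad (1.18) $$

Here, $\vec{\mu}^{s'}$ is the target state and $\vec{\epsilon}$ is the error in the state at some point during the relaxation process. With this definition, if the $i$th neuron is the next neuron selected by the updating procedure, then we have:

$$ \begin{aligned} \mu_i \rightarrow{} & \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j \Big) \\ ={} & \theta\Big( \sum_{j=1}^{N} \sum_{s=1}^{m} \mu_i^s \mu_j^s ( \mu_j^{s'} + \epsilon_j ) \Big) \\ ={} & \theta\Big( N \mu_i^{s'} + \sum_{s \ne s'} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) + \sum_{s=1}^{m} \mu_i^s ( \vec{\mu}^s \cdot \vec{\epsilon} ) \Big). \qquad (1.19) \end{aligned} $$

The situation is more complicated than before because there are now two “noise” terms:

$$ \eta_1 = \sum_{s \ne s'} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ), \qquad \eta_2 = \sum_{s=1}^{m} \mu_i^s ( \vec{\mu}^s \cdot \vec{\epsilon} ). \qquad (1.20) $$

$\eta_1$ is just the state-state interaction due to the non-orthogonality of the states, as previously discussed. $\eta_2$ is the interaction of the error vector with all of the states. This more complicated “noise” term competes with the “signal” and is responsible for the poor error-correction ability of the standard Hopfield model.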
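The decomposition with an error vector can be checked on a small example. In this sketch the three stored patterns, the choice of flipping a single neuron, and the helper `dot` are assumptions for illustration; the error vector is simply the probe state minus the target state.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


N = 8
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, 1, -1, -1, 1, -1],
]
w = [[sum(p[i] * p[j] for p in patterns) for j in range(N)] for i in range(N)]

target = patterns[0]
probe = list(target)
probe[2] = -probe[2]  # one Hamming unit of error
eps = [probe[j] - target[j] for j in range(N)]

checks = []
for i in range(N):
    field = sum(w[i][j] * probe[j] for j in range(N))
    signal = N * target[i]
    eta1 = sum(p[i] * dot(p, target) for p in patterns[1:])  # state-state term
    eta2 = sum(p[i] * dot(p, eps) for p in patterns)  # error-state term
    checks.append((field, signal, eta1, eta2))
print(checks[0])
```

For every neuron the field on the probe equals the signal plus the two noise terms, as in the last line of (1.19).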

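In light of the above analysis, we now want to examine how Potter’s modification algorithm helps to improve the standard Hopfield model and point out some of the reasons for its limitations. We consider the version of Potter’s algorithm which tests the stored states. To begin with, we demonstrate the effect of a single modification and then discuss the consequences when multiple modifications are effected.

A single modification due to the relaxation of the stored state $s'$ will yield:

$$ w_{ij} \rightarrow w'_{ij} = w_{ij} + \alpha ( \mu_i^{s'} - \mu_i^{r,s'} ) \mu_j^{s'} ( \mu_j^{s'} + 1 ) \qquad (1.21) $$

where $\vec{\mu}^{r,s'}$ denotes the state to which $\vec{\mu}^{s'}$ relaxes. Now, let us see the effect on the relaxation of $\vec{\mu}^{s'}$ with the matrix $w'_{ij}$. We examine a step in the relaxation process before any neuron has changed its value and find that, now, if the $i$th neuron is sampled, the updating procedure will give:

$$ \begin{aligned} \mu_i^{s'} \rightarrow{} & \theta\Big( \sum_{j=1}^{N} w'_{ij} \mu_j^{s'} \Big) \\ ={} & \theta\Big( N \mu_i^{s'} + \eta(\{\vec{\mu}^s\}) + \alpha ( \mu_i^{s'} - \mu_i^{r,s'} ) \sum_{j=1}^{N} ( \mu_j^{s'} )^2 ( \mu_j^{s'} + 1 ) \Big). \qquad (1.22) \end{aligned} $$

If we use the fact that $( \mu_j^{s'} )^2 = 1$, then this reduces to:

$$ \mu_i^{s'} \rightarrow \theta\Big( N \mu_i^{s'} + \eta(\{\vec{\mu}^s\}) + \alpha \Big( N + \sum_{j=1}^{N} \mu_j^{s'} \Big) ( \mu_i^{s'} - \mu_i^{r,s'} ) \Big). \qquad (1.23) $$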


If the states are stored with a random but balanced code,⁴ ⁵ then the term $\sum_j \mu_j^{s'} = 0$. We assume this is true and observe the relationship between the “signal” term, $N\mu_i^{s'}$, the “noise” term, $\eta(\{\vec{\mu}^s\})$, and the modification term, $N\alpha( \mu_i^{s'} - \mu_i^{r,s'} )$. Note that if $\mu_i^{r,s'} = \mu_i^{s'}$, then the modification term is identically zero, which is the desired result, since this corresponds to the situation in which the “signal” was stronger than the “noise”. If $\mu_i^{r,s'} > \mu_i^{s'}$, then $\mu_i^{r,s'} = 1$ and $\mu_i^{s'} = -1 \Rightarrow \eta(\{\vec{\mu}^s\}) > 0 > N\mu_i^{s'}$, and further $| N\mu_i^{s'} | < | \eta |$; but $N\alpha( \mu_i^{s'} - \mu_i^{r,s'} ) < 0$, so that the $i$th neuron will be stable provided $\alpha$ is chosen large enough, i.e.

$$ \alpha > \frac{ \big| N - | \eta | \big| }{ N \, | \mu_i^{s'} - \mu_i^{r,s'} | }. $$

If $\mu_i^{s'} > \mu_i^{r,s'}$, the same result will obtain provided $\alpha$ is again chosen larger than the same limit. This follows from the fact that $N\alpha( \mu_i^{s'} - \mu_i^{r,s'} )$ has to have the same sign as the “signal” term and the opposite sign to that of the “noise” term, so that provided $\alpha$ is sufficiently large, i.e. above the stated limit, the $i$th neuron will be stable. We can guarantee that the whole state will be stable if we choose:

$$ \alpha > \frac{ \big| N - | \eta | \big| }{ N \min_i | \mu_i^{s'} - \mu_i^{r,s'} | }, \qquad (1.24) $$

where the minimum runs over the neurons whose values changed in the relaxation and $|\eta|$ is taken at its largest over those neurons.

The limitations of this approach lie in the fact that modifications which aid in the stability of one stored state may hinder the stability of other stored states. Clearly, the fact that up to $\sim N^2$ stable stored states have already been achieved with this algorithm attests to the fact that, despite this competition of modifications, there is an overall averaging effect which achieves a substantial increase in the capacity to store stable memory states.

We have outlined this argument in terms of a “signal”, which is the only term present for orthogonal memory states, a “noise”, which is due to the lack of orthogonality, and a modification term. What the modification term achieves, then, is an effective orthogonalization of the states with respect to the non-linear updating procedure, by eliminating the effect of the noise. In a linear system, as pointed out earlier, we could create up to $N$ perfectly recallable stored states. Because the updating procedure in the Hopfield model is nonlinear, an effective orthogonalization with respect to that nonlinearity can achieve a capacity well above the dimensionality.

As we have discussed earlier, however, error correction is non-existent at $\sim N^2$. When states with error are input, $\eta(\{\vec{\mu}^s\}) \rightarrow \eta(\{\vec{\mu}^s\}, \vec{\epsilon})$, and the error term is apparently then so large for a number of the neurons that the effect of the modifications is inconsequential.

⁴ A balanced code is one with an equal number of 1’s and -1’s, or 0’s (if the “off” state is represented by a 0).

⁵ Experimentally, a random, balanced code performs 3 to 4 times better than a random but unbalanced code.
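The stability condition on $\alpha$ can be illustrated on a small balanced example. The sketch below is a hedged check, not the original simulation: the five six-neuron patterns are chosen by hand so that one neuron of the first stored state is unstable, the one-step image $\theta(h_i)$ is used as a stand-in for the relaxed state (matching the analysis of a step before any neuron has changed), and the modification is applied in the unsymmetrized form of (1.21). All names are illustrative.

```python
def theta(x):
    return 1 if x >= 0 else -1


N = 6
patterns = [  # balanced: three +1's and three -1's each
    [1, 1, 1, -1, -1, -1],
    [1, 1, -1, 1, -1, -1],
    [1, 1, -1, -1, 1, -1],
    [1, 1, -1, -1, -1, 1],
    [-1, -1, 1, 1, 1, -1],
]
target = patterns[0]
w = [[float(sum(p[i] * p[j] for p in patterns)) for j in range(N)] for i in range(N)]

h = [sum(w[i][j] * target[j] for j in range(N)) for i in range(N)]
r = [theta(x) for x in h]  # stand-in for the relaxed state (assumption)
eta = [h[i] - N * target[i] for i in range(N)]  # noise = field - signal
unstable = [i for i in range(N) if r[i] != target[i]]

# Smallest admissible alpha over the unstable neurons, plus a small margin.
bound = max((abs(eta[i]) - N) / (N * abs(target[i] - r[i])) for i in unstable)
alpha = bound + 0.05

w2 = [
    [w[i][j] + alpha * (target[i] - r[i]) * target[j] * (target[j] + 1) for j in range(N)]
    for i in range(N)
]
h2 = [sum(w2[i][j] * target[j] for j in range(N)) for i in range(N)]
stable_after = all(theta(h2[i]) == target[i] for i in range(N))
print(unstable, bound, stable_after)
```

Here only neuron 2 is unstable, with noise of magnitude 8 against a signal of 6, so the limit is $(8 - 6)/(6 \cdot 2) = 1/6$; any $\alpha$ above it restores the stability of the whole state after this single modification.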


The same analysis applies if the second version of Potter’s algorithm is used, that of probing all of the states one Hamming unit away from the stored states, except that in the modification term, $\mu_i^{s'}$ and $\mu_i^{r,s'}$ should now be thought of as the probe states and their respective relaxed states. A radius of attraction of one Hamming unit could be guaranteed for $m$ just below the dimensionality, but no better. Again, it is the combination of the size of the “noise” term and the competition between the various modifications which causes such limitations to arise.

To provide an alternative perspective on the Potter algorithm, we consider it from the point of view of the energy functional. The energy of a state $\vec{\mu}$ is given by the quadratic form in 1.3. Suppose we use the version of Potter’s algorithm in which the modifications are based on the relaxation of the stored states. Denote the energy of $\vec{\mu}$ before the modifications by $\mathcal{E}^0$. Then, after the modification procedure, we have:

$$ \begin{aligned} \mathcal{E} &= \mathcal{E}^0 - \frac{\alpha}{2} \sum_{s=1}^{m} \sum_{i,j} ( \mu_i^s - \mu_i^{r,s} ) \mu_j^s ( \mu_j^s + 1 ) \mu_i \mu_j \\ &= \mathcal{E}^0 - \frac{\alpha}{2} \sum_{s=1}^{m} \big( ( \vec{\mu}^s \cdot \vec{\mu} ) - ( \vec{\mu}^{r,s} \cdot \vec{\mu} ) \big) \Big( \sum_{j=1}^{N} \mu_j + ( \vec{\mu}^s \cdot \vec{\mu} ) \Big), \qquad (1.25) \end{aligned} $$

using the fact that $( \mu_j^s )^2 = 1$. We will assume a balanced code for the stored states. Let us consider the energy of one of the stored states, $s'$, and then examine the region of the hypercube nearby the state $s'$. For $\vec{\mu} = \vec{\mu}^{s'}$, the energy equation 1.25 reduces to:

$$ \mathcal{E}^{s'} = \mathcal{E}^{0,s'} - \frac{\alpha}{2} \sum_{s=1}^{m} \big( ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) - ( \vec{\mu}^{r,s} \cdot \vec{\mu}^{s'} ) \big) ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ), \qquad (1.26) $$

(where we have assumed a balanced code). Consider the factor $( \vec{\mu}^s \cdot \vec{\mu}^{s'} )$. It will obviously be largest for $s = s'$. The prefactor of this term is $N - ( \vec{\mu}^{s'} \cdot \vec{\mu}^{r,s'} )$, which is $\ge 0$ and is zero iff the state $s'$ was stable before the modifications. Thus, if $s'$ was unstable before the modification, this modification term will lower the energy of the state $s'$ (see Figure 1.3). Terms in the sum due to nearby stored states may in fact reverse this process, particularly if their density is high (perhaps $m > N^2$). This is possible because $\vec{\mu}^{r,s}$ for these nearby stored states will have a probability of being closer to $\vec{\mu}^{s'}$ than $\vec{\mu}^s$ is; when this situation arises, the prefactor $( \vec{\mu}^{s'} \cdot \vec{\mu}^s ) - ( \vec{\mu}^{s'} \cdot \vec{\mu}^{r,s} )$ will be negative. This is depicted graphically for $\vec{\mu}^{s'}$ and one of its neighboring states $\vec{\mu}^s$. It is these competing nearby states which provide some upper bound $> N^2$ on the number of stable stored states obtainable, because if there are too many nearby neighbors, the algorithm cannot lower the energy of the stored state. We caution, however, that this energy perspective has some limitations: the presence of a gap in the energy is only a necessary and not a sufficient condition for the system to descend through that gap. The reason for this lies in the discrete nature of the jumps; the system can get caught.⁶ ⁷

Figure 1.3: Instability of state s’ causes the largest modification term to contribute to lowering the energy of state s’ (dashed line).

The energy perspective if $\vec{\mu}$ is not one of the stored states but rather nearby one of the stored states is more complicated. This stems both from the competition of modification terms due to nearby states and from the fact that the largest term in the sum may now have the wrong sign: $\vec{\mu}$ may now be closer to $\vec{\mu}^{r,s}$ than to $\vec{\mu}^s$. In addition, modification terms due to other states targeting the same stored state may have the wrong sign for the same reason. As a consequence, we would expect the ability of Potter’s algorithm to improve error-correction in the Hopfield model to be quite limited, and, indeed, as pointed out earlier, this is exactly what he has observed experimentally.

1.4  Some  Alternative  Energy  Functions 

Among possible methods of improving the Hopfield model, the embedding of different energy (Liapunov) functions via the updating procedure poses a promising alternative to iterative modification procedures, like “unlearning” or Potter’s algorithm, which aim at simply optimizing the matrix in the quadratic form of equation 1.3. Among the options that we have considered is the possibility of adding a cubic term to the energy function, so that $\mathcal{E}$ would take the form:

$$ \mathcal{E} = -\frac{1}{2} \sum_{i,j} w_{ij} \mu_i \mu_j - \frac{1}{3!} \sum_{i,j,k} q_{ijk} \mu_i \mu_j \mu_k \qquad (1.27) $$

with $w_{ij} = \sum_{s=1}^{m} \mu_i^s \mu_j^s$ and $q_{ijk} = \sum_{s=1}^{m} \mu_i^s \mu_j^s \mu_k^s$. By choosing the associated updating procedure to be of the form:

$$ \mu_i \rightarrow \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j + \frac{1}{2!} \sum_{j,k} q_{ijk} \mu_j \mu_k \Big), \qquad (1.28) $$

the algorithm is guaranteed to descend monotonically. The definition of $q_{ijk}$ is chosen to be analogous to that of $w_{ij}$: it corresponds to a measure of the correlation between all triples of neuronal values $\mu_i$, $\mu_j$, and $\mu_k$ in the stored states $s: 1 \cdots m$. While this modified energy function may prove to work better than the original model, the amount of additional computational complexity involved in the relaxation process may not be worth the gain. Further, modifying the quadratic and cubic tensors also requires a significantly longer amount of time if the updating procedure is involved in the optimization. Finally, there is some question as to whether the form of the cubic-order coupling constant is really the best form. The question is raised because if the $i$th, $j$th, and $k$th neurons all have the value -1 in a given state, then their product is, of course, negative: this contradicts the notion of correlation, as Potter has pointed out.⁸

⁶ For this reason, one should not be misled by the fact that in the above schematic diagrams, we have drawn continuous lines connecting the states.

⁷ The claim can be demonstrated by working through some simple three-dimensional examples of the Hopfield model. See the three-dimensional examples in the Appendix section of this report.

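Another alternative, which has proved to be very promising in some preliminary investigations in three dimensions,⁹ is the following energy functional:

$$ \mathcal{E} = -\frac{1}{2} \Big( \sum_{s=1}^{m} ( \vec{\mu} \cdot \vec{\mu}^s )^2 + \alpha \sum_{s=1}^{m} \sum_{s'=1}^{m} ( \vec{\mu} \cdot \vec{\mu}^s )( \vec{\mu}^s \cdot \vec{\mu}^{s'} ) \Big) \qquad (1.29) $$

The first term is the original Hopfield term; the second term contributes most when nearby states are close. This can be rewritten as:

$$ \mathcal{E} = -\frac{1}{2} \Big( \sum_{i,j} \mu_i \mu_j w_{ij} + \alpha \sum_{i} \mu_i Q_i \Big), \qquad (1.30) $$

where:

$$ Q_i = \sum_{s=1}^{m} \sum_{s'=1}^{m} \mu_i^s ( \vec{\mu}^s \cdot \vec{\mu}^{s'} ). \qquad (1.31) $$

⁸ T. Potter, personal communication.

⁹ See the three-dimensional examples in the Appendix.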
A descent algorithm can be guaranteed if we choose:

$$ \mu_i \rightarrow \theta\Big( \sum_{j=1}^{N} w_{ij} \mu_j + \frac{\alpha}{2} Q_i \Big). \qquad (1.32) $$

This amounts to a special recipe for choosing biases, since:

$$ \frac{\alpha}{2} Q_i \qquad (1.33) $$

is just an offset. Hopfield [1] employed a special choice of biases in his original model with spin values 0 and 1 and found that it made the model perform as well as when the model was run with spins (+/-)1 and no offset; no formula for the choice of biases was given in his article, however. We are suggesting that a special choice of biases for spins (+/-)1 could improve matters even more,¹⁰ and are proposing the above form as an appropriate means of making this choice. In the three-dimensional examples which we have worked in the Appendix, we found that $\alpha$ must be chosen as a function of the stored states:¹¹ ¹²

$$ \alpha = f( \{ \vec{\mu}^s \cdot \vec{\mu}^{s'} \} ). \qquad (1.34) $$

¹⁰ See Appendix.

¹¹ For the particular examples worked in three dimensions with two stored states, $\vec{\mu}^1$ and $\vec{\mu}^2$, we found that $\alpha = 5 ( \vec{\mu}^1 \cdot \vec{\mu}^2 ) - 1$.

¹² The explicit form has yet to be derived, but our explorations in three dimensions in the Appendix suggest that a good choice should probably be a function of how close the nearest neighbors are. For this reason we have chosen the form above.


Chapter  2 


The  Analog  Hopfield  Model, 
and  Future  Directions  with 
Continuum  Models 


2.1  The  Analog  Hopfield  Model 

In articles following the appearance of his 1982 article, Hopfield has developed the idea of a continuous version of his original model [4], [5]. In this analog version, Hopfield defines $u_i$ to be the input to the $i$th neuron and $V_i = g(u_i)$ to be the output of the $i$th neuron ($g(u_i)$ is a sigmoid). The analog version is composed of an R-C network constructed from standard op-amps, the dynamics of which are governed by:

$$ C_i \frac{du_i}{dt} = \sum_{j=1}^{N} w_{ij} V_j - \frac{u_i}{R_i} + I_i \qquad (2.1) $$

(here, $I_i$ is an external current, which is just an offset). As in the digital version of the model, the system descends monotonically to a minimum of an associated Liapunov function:¹

$$ \mathcal{E} = -\frac{1}{2} \sum_{i,j} w_{ij} V_i V_j + \sum_{i=1}^{N} \frac{1}{R_i} \int_0^{V_i} g^{-1}(V) \, dV - \sum_{i=1}^{N} I_i V_i \qquad (2.2) $$

Comparing  the  performance  of  the  analog  version  of  the  model  with  that 
of  the  digital  version,  Potter  [7]  has  found  that  the  analog  model  functions 
better  than  the  digital  model  in  its  ability  to  store  stable  stored  states  and 
do  error  correction.  Likewise,  Hopfield  [  JJ"]  has  found  that  his  analog 
model  arrives  at  vastly  better  solutions  of  the  travelling  salesman  problem. 
Still,  there  is  room  for  much  improvement. 


2.2  Alternative  Analog  Equations 

Given the limitations of the Hopfield model, both analog and digital, we have begun exploring the problem of stable stored states and error correction in a more general way at the suggestion of Professor Cooper. As a starting place, we began by studying equations of the form:²

$$ \frac{du}{dt} = -\nabla_u F( u; \{u^s\} ) \qquad (2.3) $$

$$ \text{with } F( u; \{u^s\} ) = \prod_{s=1}^{m} ( u - u^s )^2. \qquad (2.4) $$


In the simplest situation, in one dimension, with one stored state, we have:

$$ \frac{du}{dt} = -2( u - u^1 ). \qquad (2.5) $$

The solution is easily found:

$$ u = u^1 + ( u(0) - u^1 ) e^{-2t} \qquad (2.6) $$

$$ \text{with } \lim_{t \to \infty} u(t) = u^1. \qquad (2.7) $$

There is only one fixed point, the stored state, and it is stable. Furthermore,
every state not beginning at $u^1$ converges to it asymptotically.
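
As a quick numerical check of (2.6), the sketch below integrates the single-state flow with forward Euler and compares it with the closed-form solution. It assumes the quadratic choice $F = (u - u^1)^2$, so that $du/dt = -2(u - u^1)$; the step size and the values of `u1` and `u_start` are arbitrary illustrative choices.

```python
import math

def closed_form(u_start, u1, t):
    """u(t) = u1 + (u(0) - u1) * exp(-2 t), the solution quoted in the text."""
    return u1 + (u_start - u1) * math.exp(-2.0 * t)

def euler(u_start, u1, t_end, dt=1e-4):
    """Forward-Euler integration of du/dt = -2 (u - u1)."""
    u = u_start
    for _ in range(round(t_end / dt)):
        u += dt * (-2.0 * (u - u1))
    return u

u1, u_start = 0.7, -1.5   # hypothetical stored state and initial condition
for t in (0.5, 1.0, 5.0):
    print(t, euler(u_start, u1, t), closed_form(u_start, u1, t))
```

The two columns should agree closely at every time, and both approach `u1` as `t` grows.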

For two stored states in one dimension, the equation of motion is

$$\frac{du}{dt} = -4(u - u^1)(u - u^2)\left(u - \frac{u^1 + u^2}{2}\right). \tag{2.8}$$

The fixed points of the system are $u = u^1$, $u = u^2$, and $u = (u^1 + u^2)/2$.
Again as desired, only the stored states themselves are stable fixed points,
and they are the only attractors, partitioning the space evenly. In one
dimension with three or more stored states, however, the original form of $F$
leads to an equation of motion with a non-uniform partitioning of the regions
of attraction of the stored states. For example, with three stored states, we
have:

$$\frac{du}{dt} = -2(u - u^1)(u - u^2)(u - u^3) \cdot \sum_{i<j} (u - u^i)(u - u^j). \tag{2.9}$$

² This form was suggested by Amir Dembo and Ofer Zeitouni, Visiting Assistant
Professors (Res.) in Applied Mathematics, Brown University.


Figure 2.2: Phase diagram for two stored states in one dimension.


The unequal partitioning comes from the last factor involving the sum,
which is an interference term due to the “interaction” of the stored states.
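
The size of this effect is easy to quantify. For three stored states, the unstable fixed points are the roots of the quadratic factor $\sum_{i<j}(u - u^i)(u - u^j)$, and these roots are the boundaries between the basins of attraction. The sketch below, with hypothetical state values and illustrative function names, computes those boundaries and compares them with the midpoints between neighboring states.

```python
import math

def basin_boundaries(u1, u2, u3):
    """Roots of sum_{i<j} (u - u^i)(u - u^j) = 3u^2 - 2*s*u + p."""
    s = u1 + u2 + u3                      # sum of the stored states
    p = u1 * u2 + u1 * u3 + u2 * u3       # sum of pairwise products
    root = math.sqrt(s * s - 3.0 * p)     # real and positive for distinct states
    return (s - root) / 3.0, (s + root) / 3.0

def midpoints(u1, u2, u3):
    """Midpoints between neighboring stored states."""
    a, b, c = sorted((u1, u2, u3))
    return (a + b) / 2.0, (b + c) / 2.0

states = (0.0, 1.0, 3.0)   # hypothetical stored states
print(basin_boundaries(*states))
print(midpoints(*states))
```

For the states 0, 1, 3 the boundaries come out near 0.451 and 2.215 rather than at the midpoints 0.5 and 2.0, so the middle state captures more of the line than a nearest-neighbor partition would give it.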


This observation led us to modify the original form of $F$, or more
importantly $\nabla F$, so that the domain of attraction would be evenly
partitioned among the stored states while the stored states remained the
only stable fixed points and attractors in the system. The beauty of the
result obtained below is that its properties are independent of the number,
density, or distribution of the stored states. The form of $\nabla F$
satisfying the above requirements is:


$$\nabla F = a\,(u - u^1) \cdots (u - u^m) \left(u - \frac{u^1 + u^2}{2}\right) \cdots \left(u - \frac{u^{m-1} + u^m}{2}\right) \tag{2.10}$$


so that the equation of motion becomes:

$$\frac{du}{dt} = -\nabla F = -a\,(u - u^1) \cdots (u - u^m) \left(u - \frac{u^1 + u^2}{2}\right) \cdots \left(u - \frac{u^{m-1} + u^m}{2}\right). \tag{2.11}$$


The  phase  diagram  for  this  is: 
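
The behaviour such a phase diagram depicts can also be checked numerically. The sketch below integrates a flow of this form, $du/dt = -a \prod_s (u - u^s) \prod_k (u - c_k)$ with $c_k$ the midpoints of neighboring stored states, using forward Euler, and reports which stored state each starting point reaches. The stored states, the gain `a = 1`, the step size, and all names are hypothetical choices made for illustration; the states are assumed to be sorted and distinct.

```python
def velocity(u, states, a=1.0):
    """Right-hand side -a * prod(u - u^s) * prod(u - midpoint_k)."""
    value = -a
    for s in states:
        value *= (u - s)
    for left, right in zip(states, states[1:]):
        value *= (u - 0.5 * (left + right))
    return value

def settle(u_start, states, dt=1e-3, steps=30000):
    """Forward-Euler integration, run long enough to become nearly stationary."""
    u = u_start
    for _ in range(steps):
        u += dt * velocity(u, states)
    return u

def nearest(u, states):
    """Stored state closest to u."""
    return min(states, key=lambda s: abs(u - s))

states = [0.0, 1.0, 3.0]                       # hypothetical, sorted
starts = [-0.8, 0.48, 0.52, 1.9, 2.1, 3.6]
for u0 in starts:
    print(u0, round(settle(u0, states), 6), nearest(u0, states))
```

Each start should end at its nearest stored state. That includes 0.48 and 2.1, which the unmodified flow with $F = \prod_s (u - u^s)^2$ would send to the middle state for these particular values.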

If we now consider the problem in higher dimensions, we observe that if
we take the coordinates to be independent of one another we can generate

Three Dimensional Examples Comparing the Digital Hopfield Model
with Various Algorithms to Improve It

In this appendix, we discuss three examples of the Hopfield model in
three dimensions. In the first three figures, we show how the standard
Hopfield model functions in each of the three examples. Examples 2 and 3
are then revisited after Potter’s modification has been applied to the
synaptic matrix. Finally, examples 2 and 3 are studied after the standard
Hopfield model has been modified by choosing biases according to the recipe
that we specified in chapter 2, and the results are contrasted with those of
Potter’s algorithm and the standard model.

Figure A.2: States closer together in the standard Hopfield model. Spurious
attractors and fixed points appear; flows are no longer ideal. There is
“noise” due to state-state interaction and state-error interaction.

Figure A.3: Standard Hopfield model. The states are placed closer together.
Spurious attractors and fixed points appear. Note also that, as in the
previous figure, the mere presence of an energy gap is not enough for a flow
(see point F in this figure and points D and E in the previous figure).

Figure A.4: Example 2 revisited with Potter’s algorithm. Improvements: 4
flows of nearest neighbors, instead of just 2, are correct. Spurious attractors
and fixed points still persist.